Distributed Caching Using the HTCondor CacheD

Authors

  • Derek J. Weitzel
  • Brian Bockelman
  • David Swanson

Abstract

A batch processing job in a distributed system has three distinct steps: stage-in, execution, and stage-out. As data sizes have grown, stage-in time has grown with them. To reduce stage-in time for shared inputs, we propose the CacheD, a caching mechanism for high throughput computing. In addition to caching on worker nodes for rapid transfers, we introduce a novel transfer method that uses BitTorrent to distribute shared caches to multiple worker nodes. We show that our caching method significantly improves workflow completion times by minimizing stage-in time. It does this while remaining non-intrusive to the computational resources, so that opportunistic resources can also use it.
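To make the stage-in saving concrete, here is a minimal Python sketch of a per-worker-node cache for shared inputs. It is not the CacheD implementation or any HTCondor API. `WorkerNodeCache`, `stage_in`, `run_job`, and the in-memory `origin` dictionary are hypothetical stand-ins, and the sketch leaves out the BitTorrent distribution between nodes entirely. It only illustrates the core idea. When many jobs on one node share an input, that input crosses the network once, and later jobs read the local copy.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class WorkerNodeCache:
    """Hypothetical per-node cache for shared job inputs (not the CacheD API)."""
    fetch_from_origin: Callable[[str], bytes]  # stands in for a transfer from the submit host
    store: Dict[str, bytes] = field(default_factory=dict)
    origin_transfers: int = 0  # how many times the network path was actually used

    def stage_in(self, name: str) -> bytes:
        # Cache hit: the input is already on this node, so no network transfer happens.
        if name in self.store:
            return self.store[name]
        # Cache miss: transfer once from the origin and keep a local copy for later jobs.
        data = self.fetch_from_origin(name)
        self.origin_transfers += 1
        self.store[name] = data
        return data


def run_job(cache: WorkerNodeCache, input_name: str) -> str:
    """Toy job following the stage-in, execution, stage-out pattern."""
    data = cache.stage_in(input_name)          # stage-in (served from cache when possible)
    result = hashlib.sha256(data).hexdigest()  # execution (placeholder computation)
    return result                              # stage-out (result handed back to the caller)


# Usage: ten jobs on the same node all read the same shared input.
origin = {"shared.db": b"x" * 1024}
cache = WorkerNodeCache(fetch_from_origin=lambda name: origin[name])
results = [run_job(cache, "shared.db") for _ in range(10)]
print(cache.origin_transfers)  # 1
```

With ten jobs sharing `shared.db`, only the first one triggers a transfer from the origin. A real deployment would keep the cache on disk and would need eviction and consistency handling, which this sketch ignores.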

Similar resources

On the Design of Scalable Peer-to-Peer Video Caching

Peer-to-Peer (P2P) video caching is a promising approach to accommodate asynchronous requests from cached content at individual peers. However, coherently managing a distributed, heterogeneous, dynamic and potentially large scale cache space is a challenging task. In particular, a key challenge is to effectively control the number of cached copies for popular streams in order to accommodate the...

CS380L Project Writeup: Distributed Completion Service

Task parallelism is difficult to implement in a distributed setting due to machine unreliability and communication latency. HTCondor, an existing distributed computation framework, is insufficient for addressing these shortcomings. In this report, we present a high level abstraction built on top of HTCondor called the Distributed Completion Service (DCS). The DCS uses multiple different methods...

Data Sufficiency for Queries on Cache

In distributed computing environments, replication of data provides improved availability, isolation between workloads with different characteristics, and improved performance through local access to data. The “real data” is server resident and by “local data” we refer to cached client data. We examine which data should be cached on behalf of a cached query. The minimum requirement for cached da...

Caching schemes for DCOP search algorithms

Distributed Constraint Optimization (DCOP) is useful for solving agent-coordination problems. Any-space DCOP search algorithms require only a small amount of memory but can be sped up by caching information. However, their current caching schemes do not exploit the cached information when deciding which information to preempt from the cache when a new piece of information needs to be cached. Ou...

Publication date: 2015